Proceedings Template - WORD
Authors
Abstract
This paper describes an approach to text classification using language models. The approach is a natural extension of the traditional Naïve Bayes classifier, in which Laplace smoothing is replaced by more sophisticated smoothing methods. We tested four smoothing methods commonly used in information retrieval. Our experimental results show that a language model performs better than the traditional Naïve Bayes classifier. In addition, we introduce into the existing smoothing methods a smoothing scale factor that depends on the amount of training data available for the class, which further improves classification performance.
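The idea of swapping the smoothing step inside a Naïve Bayes classifier can be illustrated with a minimal sketch. The abstract does not name the four smoothing methods, so the example below assumes Jelinek-Mercer interpolation (a smoothing method widely used in information retrieval) purely as a stand-in. The function names, the toy data, and the value `lam=0.5` are hypothetical, and the class-size-dependent scale factor mentioned in the abstract is not attempted here.

```python
import math
from collections import Counter


def train(docs):
    """Collect per-class word counts, collection-wide counts, and class priors.

    docs is a list of (label, tokens) pairs.
    """
    class_counts = {}
    doc_counts = Counter()
    for label, tokens in docs:
        class_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
    collection = Counter()
    for counts in class_counts.values():
        collection.update(counts)
    total_docs = sum(doc_counts.values())
    priors = {label: n / total_docs for label, n in doc_counts.items()}
    return class_counts, collection, priors


def laplace_prob(word, counts, vocab_size):
    """Add-one (Laplace) estimate, as in the traditional Naive Bayes classifier."""
    return (counts[word] + 1) / (sum(counts.values()) + vocab_size)


def jm_prob(word, counts, collection, lam=0.5):
    """Jelinek-Mercer estimate: interpolate class model and collection model.

    Assumes 0 < lam <= 1 so that any word seen in training has non-zero probability.
    """
    p_class = counts[word] / sum(counts.values())
    p_collection = collection[word] / sum(collection.values())
    return (1 - lam) * p_class + lam * p_collection


def classify(tokens, class_counts, collection, priors, smoothing="jm", lam=0.5):
    """Return the class with the highest log posterior under the chosen smoothing."""
    vocab_size = len(collection)
    best_label, best_score = None, -math.inf
    for label, counts in class_counts.items():
        score = math.log(priors[label])
        for word in tokens:
            if word not in collection:
                continue  # ignore words never seen in training
            if smoothing == "laplace":
                p = laplace_prob(word, counts, vocab_size)
            else:
                p = jm_prob(word, counts, collection, lam)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set (hypothetical data, for illustration only).
docs = [
    ("sports", ["ball", "goal", "team"]),
    ("sports", ["team", "win", "goal"]),
    ("tech", ["code", "bug", "team"]),
    ("tech", ["code", "compile", "bug"]),
]
class_counts, collection, priors = train(docs)
label_jm = classify(["goal", "team"], class_counts, collection, priors, smoothing="jm")
label_laplace = classify(["goal", "team"], class_counts, collection, priors, smoothing="laplace")
```

Only the per-word probability estimate changes between the two variants; the decision rule (the class prior multiplied by the product of word probabilities, computed in log space) stays the same, which is why the language-model view reads as an extension of Naïve Bayes and not as a different classifier.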
Similar resources
Proceedings Template - WORD
Path loss and delay profile models for ITS applications, based on data measured in the 700 MHz band, are presented.
Proceedings Template - WORD
Preserving privacy while publishing social network data has become a serious issue with the rapid growth of social networks. In this work, we propose a perturbation-based approach for privacy-preserving publication of social network graphs and evaluate the utility of the proposed method using a real-world dataset.
Proceedings Template - WORD
This poster presents a computational analysis of conceptual metaphors in a community of political blogs. Like sentiment analysis or opinion extraction, computational metaphor identification can provide a means of understanding the particular framings or conceptualizations used in a community. This poster includes an overview of the implementation and a summary of results.
POC-NLW Template for Chinese Word Segmentation
In this paper, a language tagging template named POC-NLW (position of a character within an n-length word) is presented. Based on this template, a two-stage statistical model for Chinese word segmentation is constructed. In this method, the basic word segmentation is based on an n-gram language model, and a Hidden Markov tagger based on the POC-NLW template is used to implement the out-of-vocabular...
CASRA+: A Colloquial Arabic Speech Recognition Application
The research proposed here is an Arabic speech recognition application that concentrates on the Lebanese dialect. The system starts by sampling the speech, that is, converting the sound from analog to digital, and then extracts features using Mel-Frequency Cepstral Coefficients (MFCC). The extracted features are then compared with the system's stored model; in this...
Proceedings Template - WORD
Most approaches for dealing with uncertainty in the Semantic Web rely on the principle that this uncertainty is already asserted. In this paper, we propose a new approach to learn and reason about uncertainty in the Semantic Web. Using instance data, we learn the uncertainty of an OWL ontology and use that information to perform probabilistic reasoning on it. For this purpose, we use Ma...